NOTES ON USE OF GENERALIZED ENTROPIES IN COUNTING

ALEXEY E. RASTEGIN 


Abstract. We address the idea of applying generalized entropies to counting
problems. First, we consider some entropic properties that are essential for
such purposes. Using the $\alpha$-entropies of Tsallis-Havrda-Charvát type, we
derive several results connected with Shearer's lemma. In particular, we derive
upper bounds on the maximum possible cardinality of a family of $k$-subsets,
when no pairwise intersections of these subsets may coincide. Further, we
revisit the Minc conjecture. Our approach leads to a one-parameter family of
extensions of Bregman's theorem. The utility of the obtained bounds is
illustrated by an explicit example.


1. Introduction 

The concept of entropy is fundamental in both statistical physics and information
theory. It plays a certain role in applying information-theoretic ideas to
combinatorial problems [?]. Many results of this kind were reviewed by Radhakrishnan
[21] and Galvin [10]. An entropy approach is often used in studies of colorings of
graphs [?]. Applications of entropy as a combinatorial tool are typically
based on the Shannon entropy and its conditional form. Meanwhile, other entropic
functions have been found useful in various questions [2]. The Rényi entropy [25]
and the Tsallis-Havrda-Charvát (THC) entropy [14, 29] are especially important
extensions of the Shannon entropy. In principle, such entropic functions may have
combinatorial or computational applications. For instance, both have been
used in the global thresholding approach to image processing [?].

The main goal of this study is to address an entropy-based approach to counting
problems using the Tsallis-Havrda-Charvát entropies. The paper is organized
as follows. In Section 2, we recall properties of the THC entropies and prove a useful
statement. In Section 3, we obtain THC-entropy versions of some combinatorial
results related to the so-called Shearer lemma. In particular, we consider an upper
estimate for the maximum possible cardinality of a family of $k$-subsets of a given
set, when the subsets obey certain restrictions. In Section 4, we derive a one-parameter
family of upper bounds on permanents of square (0,1)-matrices. This family is
an extension of the Bregman theorem. We describe an example that illustrates the
utility of the presented extension.

2. Definitions and properties of the THC $\alpha$-entropies

In this section, we briefly recall definitions of the Tsallis-Havrda-Charvát
entropies and related conditional entropies. Required properties of these entropic
functionals are discussed as well. Let a discrete random variable $X$ take values in
the finite set $\Omega_X$. The non-extensive entropy of strictly positive degree $\alpha \neq 1$ is
defined by [?]

(2.1)    $H_\alpha(X) := \frac{1}{1-\alpha} \Bigl( \sum_{x \in \Omega_X} p(x)^{\alpha} - 1 \Bigr) .$


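With the factor $(2^{1-\alpha} - 1)^{-1}$ instead of $(1-\alpha)^{-1}$, this entropic form was considered
by Havrda and Charvát [14]. In non-extensive statistical mechanics, the entropy
(2.1) is known as the Tsallis entropy. It is instructive to rewrite (2.1) as

(2.2)    $H_\alpha(X) = - \sum_{x \in \Omega_X} p(x)^{\alpha} \ln_\alpha p(x) = \sum_{x \in \Omega_X} p(x) \ln_\alpha \Bigl( \frac{1}{p(x)} \Bigr) .$

Assuming $\xi > 0$, the so-called $\alpha$-logarithm is defined as

(2.3)    $\ln_\alpha(\xi) = \begin{cases} \dfrac{\xi^{1-\alpha} - 1}{1-\alpha} , & \text{for } \alpha > 0 ,\ \alpha \neq 1 , \\ \ln \xi , & \text{for } \alpha = 1 . \end{cases}$

In the limit $\alpha \to 1$, the entropy (2.1) gives the standard Shannon entropy

(2.4)    $H_1(X) = - \sum_{x \in \Omega_X} p(x) \ln p(x) .$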


For all real $q \in [0,1]$, we write the binary THC entropy

(2.5)    $h_\alpha(q) := - q^{\alpha} \ln_\alpha(q) - (1-q)^{\alpha} \ln_\alpha(1-q) .$

For $q \in (0,1)$, this function is concave and obeys $h_\alpha(q) = h_\alpha(1-q)$. The THC
entropies inherit some natural properties of the Shannon entropy. The maximal
value of (2.1) is equal to $\ln_\alpha |\Omega_X|$ and is reached with the uniform distribution. For
$\alpha \geq 1$, the joint THC entropy of two random variables obeys [9]

(2.6)    $H_\alpha(X,Y) \leq H_\alpha(X) + H_\alpha(Y) .$
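
The definitions (2.1)-(2.6) are easy to evaluate numerically. The following Python sketch is only an illustration: the function names `ln_alpha`, `thc_entropy`, and `binary_thc_entropy`, as well as the sample joint distribution, are hypothetical choices made here and are not part of the original text.

```python
import math

def ln_alpha(xi, alpha):
    """alpha-logarithm (2.3) for xi > 0 and alpha > 0."""
    if alpha == 1.0:
        return math.log(xi)
    return (xi ** (1.0 - alpha) - 1.0) / (1.0 - alpha)

def thc_entropy(probs, alpha):
    """THC entropy (2.1); the Shannon entropy (2.4) is used at alpha = 1."""
    if alpha == 1.0:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (sum(p ** alpha for p in probs if p > 0) - 1.0) / (1.0 - alpha)

def binary_thc_entropy(q, alpha):
    """Binary THC entropy (2.5), written through (2.1)."""
    return thc_entropy([q, 1.0 - q], alpha)

# Illustrative joint distribution p(x, y): rows are values of X, columns of Y.
joint = [[0.10, 0.25, 0.05],
         [0.30, 0.10, 0.20]]
p_x = [sum(row) for row in joint]            # marginal of X
p_y = [sum(col) for col in zip(*joint)]      # marginal of Y
p_xy = [p for row in joint for p in row]     # flattened joint distribution
```

With these helpers one can compare `thc_entropy(p_xy, 2.0)` against `thc_entropy(p_x, 2.0) + thc_entropy(p_y, 2.0)` to see the subadditivity (2.6) on this particular distribution.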


In applications of information-theoretic methods, the notion of conditional
entropy is widely used [?]. Let us introduce the particular functional

         $H_1(X|y) = - \sum_{x} p(x|y) \ln p(x|y) ,$

in which the sum is taken over $x \in \Omega_X$. The entropy of $X$ conditional on knowing
$Y$ is defined as [5]

(2.7)    $H_1(X|Y) := \sum_{y} p(y) H_1(X|y) = - \sum_{x} \sum_{y} p(x,y) \ln p(x|y) ,$

where $p(x|y) = p(x,y)/p(y)$. When the range of summation is clear from the
context, we will omit symbols such as $\Omega_X$ and $\Omega_Y$.

In the literature, two kinds of the conditional THC entropy have been discussed
[9]. These forms are respectively inspired by the two expressions given in (2.2).
The first form is defined as [9]

(2.8)    $H_\alpha(X|Y) := \sum_{y} p(y)^{\alpha} H_\alpha(X|y) ,$

where

(2.9)    $H_\alpha(X|y) := \frac{1}{1-\alpha} \Bigl( \sum_{x} p(x|y)^{\alpha} - 1 \Bigr) = - \sum_{x} p(x|y)^{\alpha} \ln_\alpha p(x|y)$

and strictly positive $\alpha \neq 1$. The conditional entropy (2.8) is, up to a factor, the
quantity originally introduced by Daróczy [7]. For any $\alpha > 0$, this conditional
entropy obeys the chain rule written as [7]

(2.10)   $H_\alpha(X,Y) = H_\alpha(X|Y) + H_\alpha(Y) .$

Due to nonnegativity of $H_\alpha(X|Y)$, for all $\alpha > 0$ we also have

         $H_\alpha(X,Y) \geq H_\alpha(Y) .$

The chain rule (2.10) can be extended to more than two variables. Up to reordering
of random variables, this result is expressed as [9]

(2.11)   $H_\alpha(X_1, \ldots, X_n) = \sum_{j=1}^{n} H_\alpha(X_j | X_{j-1}, \ldots, X_1) ,$

where $\alpha > 0$. In the case $\alpha = 1$, we obtain the chain rule with the standard
conditional entropy (2.7). This property turns out to be essential in the entropic
approach to counting. The second form of conditional THC entropy is introduced
as [9]

(2.12)   $\widetilde{H}_\alpha(X|Y) := \sum_{y} p(y) H_\alpha(X|y) .$

Although the quantity (2.12) does not obey the chain rule, it has found use in
some questions [?]. Its definition is based on the formulation, which seems to
be more appropriate in the context of dynamical systems and generalized entropy
rates [?]. Since $p(y)^{\alpha} \geq p(y)$ for $\alpha \in (0,1)$ and $p(y)^{\alpha} \leq p(y)$ for $\alpha \in (1,\infty)$, we
also have $\widetilde{H}_\alpha(X|Y) \leq H_\alpha(X|Y)$ for $\alpha \in (0,1)$ and $\widetilde{H}_\alpha(X|Y) \geq H_\alpha(X|Y)$
for $\alpha \in (1,\infty)$. For $\alpha = 1$, the $\alpha$-entropies (2.8) and (2.12) both coincide
with (2.7).
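
The two conditional forms differ only in the weight attached to each value $y$, which can be made explicit in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`thc`, `conditional`, `cond_first`, `cond_second`) and the sample distribution are hypothetical, and the code is restricted to $\alpha \neq 1$.

```python
def thc(probs, alpha):
    """THC entropy (2.1) of a probability vector, for alpha > 0, alpha != 1."""
    return (sum(p ** alpha for p in probs if p > 0) - 1.0) / (1.0 - alpha)

def conditional(joint, alpha, weight_power):
    """sum_y p(y)^weight_power * H_alpha(X|y), where joint[x][y] = p(x, y)."""
    total = 0.0
    for column in zip(*joint):                 # one column per value y
        p_y = sum(column)
        if p_y > 0:
            total += p_y ** weight_power * thc([p / p_y for p in column], alpha)
    return total

def cond_first(joint, alpha):
    """First form (2.8): weights p(y)^alpha."""
    return conditional(joint, alpha, alpha)

def cond_second(joint, alpha):
    """Second form (2.12): weights p(y)."""
    return conditional(joint, alpha, 1.0)

# Illustrative joint distribution: rows are values of X, columns of Y.
joint = [[0.10, 0.25, 0.05],
         [0.30, 0.10, 0.20]]
p_y = [sum(col) for col in zip(*joint)]
p_xy = [p for row in joint for p in row]

def chain_rule_gap(alpha):
    """Left-hand side minus right-hand side of the chain rule (2.10)."""
    return thc(p_xy, alpha) - (cond_first(joint, alpha) + thc(p_y, alpha))
```

On this example `chain_rule_gap(alpha)` vanishes up to rounding for the first form, while replacing `cond_first` by `cond_second` generally leaves a nonzero gap, in line with the remark that (2.12) does not obey the chain rule.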

When the entropic approach is used in counting, several properties of the conditional
entropy are required. One of these properties is the chain rule. The standard conditional
entropy also satisfies

(2.13)   $H_1(X | Y_1, \ldots, Y_{n-1}, Y_n) \leq H_1(X | Y_1, \ldots, Y_{n-1}) .$

Thus, conditioning on more can only reduce the conditional entropy. This relation is
sometimes required in counting [21]. Another very useful property of the standard
conditional entropy is formulated as follows. Let $Y \mapsto f(Y)$ be some function,
whose domain covers the support of the random variable $Y$. Then we have [?]

(2.14)   $H_1(X | f(Y)) \geq H_1(X|Y) .$

We shall now establish analogous properties for the conditional $\alpha$-entropies.

Proposition 1. Let $X$ and $Y_1, \ldots, Y_n$ be discrete random variables, where $n \geq 1$.
For $\alpha \geq 1$, the conditional entropy (2.8) satisfies

(2.15)   $H_\alpha(X | Y_1, \ldots, Y_{n-1}, Y_n) \leq H_\alpha(X | Y_1, \ldots, Y_{n-1}) .$

For $\alpha > 0$, the conditional entropy (2.12) satisfies

(2.16)   $\widetilde{H}_\alpha(X | Y_1, \ldots, Y_{n-1}, Y_n) \leq \widetilde{H}_\alpha(X | Y_1, \ldots, Y_{n-1}) .$

Let $Y \mapsto f(Y)$ be a function of the random variable $Y$. For $\alpha \geq 1$, the conditional
entropy (2.8) satisfies

(2.17)   $H_\alpha(X | f(Y)) \geq H_\alpha(X|Y) .$

For $\alpha > 0$, the conditional entropy (2.12) satisfies

(2.18)   $\widetilde{H}_\alpha(X | f(Y)) \geq \widetilde{H}_\alpha(X|Y) .$
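Proof. The results (2.15) and (2.16) were proved in [?] and [25], respectively.

Let us proceed to (2.17) and (2.18). Since the standard case is known, we assume
$\alpha \neq 1$. To each value $u$ of the function, we assign the subset $\omega_u \subseteq \Omega_Y$ such that

         $\omega_u := \{ y : y \in \Omega_Y ,\ f(y) = u \} .$

Then the probabilities are written as

(2.19)   $p(x,u) = \sum_{y \in \omega_u} p(x,y) , \qquad p(u) = \sum_{y \in \omega_u} p(y) .$

The left-hand side of (2.17) is represented as

(2.20)   $H_\alpha(X | f(Y)) = \sum_{u \in \Omega_{f(Y)}} p(u)^{\alpha} H_\alpha(X|u) .$

Replacing $p(u)^{\alpha}$ with $p(u)$, we obtain the expression for $\widetilde{H}_\alpha(X|f(Y))$. For strictly
positive $\alpha \neq 1$ and $\xi \geq 0$, we introduce the function

         $\eta_\alpha(\xi) = \frac{\xi^{\alpha} - \xi}{1 - \alpha} .$

In terms of this function, we now write

         $p(u)^{\alpha} H_\alpha(X|u) = \sum_{x \in \Omega_X} p(u)^{\alpha} \eta_\alpha\bigl( p(x|u) \bigr) .$

As $\eta''_\alpha(\xi) = - \alpha \xi^{\alpha-2} \leq 0$ for the considered values of $\alpha$, the function
$\xi \mapsto \eta_\alpha(\xi)$ is concave. For fixed $x$ and $u$, we put numbers $\lambda_y = p(y)/p(u)$ and
$\xi_y = p(x,y)/p(y) = p(x|y)$ such that

         $\sum_{y \in \omega_u} \lambda_y = \sum_{y \in \omega_u} \frac{p(y)}{p(u)} = 1 , \qquad \sum_{y \in \omega_u} \lambda_y \xi_y = \frac{p(x,u)}{p(u)} = p(x|u) ,$

according to (2.19). By Jensen's inequality, we then obtain

(2.21)   $p(u) \, \eta_\alpha\bigl( p(x|u) \bigr) \geq p(u) \sum_{y \in \omega_u} \lambda_y \eta_\alpha(\xi_y) = \sum_{y \in \omega_u} p(y) \, \eta_\alpha\bigl( p(x|y) \bigr) ,$

(2.22)   $p(u)^{\alpha} \eta_\alpha\bigl( p(x|u) \bigr) \geq p(u)^{\alpha} \sum_{y \in \omega_u} \lambda_y \eta_\alpha(\xi_y) = \sum_{y \in \omega_u} p(u)^{\alpha-1} p(y) \, \eta_\alpha\bigl( p(x|y) \bigr) .$

Summing (2.21) with respect to $x \in \Omega_X$, for all the considered values of $\alpha$ one gets

         $p(u) H_\alpha(X|u) \geq \sum_{y \in \omega_u} p(y) H_\alpha(X|y) .$

The latter leads to (2.18) after summing with respect to $u \in \Omega_{f(Y)}$. For all $y \in \omega_u$
and $\alpha > 1$, we have $p(u) \geq p(y)$ and $p(u)^{\alpha-1} p(y) \geq p(y)^{\alpha}$. Combining this with
(2.22) and summing with respect to $x \in \Omega_X$, we obtain

         $p(u)^{\alpha} H_\alpha(X|u) \geq \sum_{y \in \omega_u} p(y)^{\alpha} H_\alpha(X|y) .$

Summing this with respect to $u \in \Omega_{f(Y)}$ completes the proof of (2.17). □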
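
The coarse-graining step (2.19) and the inequalities (2.17), (2.18) can be checked on a small example. The Python sketch below is illustrative only: the names `coarse_grain`, `cond_first`, `cond_second`, the joint distribution, and the particular function $f$ (encoded in `labels`) are assumptions made for this illustration.

```python
def thc(probs, alpha):
    """THC entropy (2.1) for alpha > 0, alpha != 1."""
    return (sum(p ** alpha for p in probs if p > 0) - 1.0) / (1.0 - alpha)

def conditional(joint, alpha, weight_power):
    """sum_y p(y)^weight_power * H_alpha(X|y), where joint[x][y] = p(x, y)."""
    total = 0.0
    for column in zip(*joint):
        p_y = sum(column)
        if p_y > 0:
            total += p_y ** weight_power * thc([p / p_y for p in column], alpha)
    return total

def cond_first(joint, alpha):
    return conditional(joint, alpha, alpha)    # definition (2.8)

def cond_second(joint, alpha):
    return conditional(joint, alpha, 1.0)      # definition (2.12)

def coarse_grain(joint, labels):
    """Joint distribution of (X, f(Y)) built as in (2.19); labels[y] = f(y)."""
    values = sorted(set(labels))
    return [[sum(p for p, u in zip(row, labels) if u == v) for v in values]
            for row in joint]

# Illustrative data: Y takes four values, f merges them in pairs.
joint = [[0.10, 0.05, 0.20, 0.05],
         [0.15, 0.20, 0.05, 0.20]]
labels = [0, 0, 1, 1]
coarse = coarse_grain(joint, labels)
```

Comparing `cond_first(coarse, 2.0)` with `cond_first(joint, 2.0)` illustrates (2.17), and the same comparison with `cond_second` at, say, $\alpha = 0.5$ illustrates (2.18).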


Note that the standard case $\alpha = 1$ of (2.17) and (2.18) can be proved by repeating
the above reasoning with the concave function $\xi \mapsto - \xi \ln \xi$. In the mentioned ranges
of the parameter, the conditional THC entropies (2.8) and (2.12) do not increase
under conditioning on more. The result (2.15) has allowed one to derive the
one-parameter extension of entropic Bell inequalities originally given in [3]. Using
(2.14), one can deduce a property useful in the entropic approach to Bregman's theorem
[20, 21]. We shall now formulate a similar statement for the $\alpha$-entropies.


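Corollary 2. Let the support $\Omega_Y$ of the random variable $Y$ be partitioned into $m$
mutually disjoint sets $\omega_j$ as

         $\Omega_Y = \bigcup_{j=1}^{m} \omega_j .$

Let $\varpi_j \subseteq \Omega_X$ be defined as

         $\varpi_j := \{ x : x \in \Omega_X ,\ y \in \omega_j ,\ p(x|y) \neq 0 \} .$

If $\varpi_j \neq \varpi_k$ for all $j \neq k$, then

(2.23)   $H_\alpha(X|Y) \leq \sum_{j=1}^{m} \Pr[Y \in \omega_j]^{\alpha} \ln_\alpha |\varpi_j|$    $(1 \leq \alpha < \infty) ,$

(2.24)   $\widetilde{H}_\alpha(X|Y) \leq \sum_{j=1}^{m} \Pr[Y \in \omega_j] \ln_\alpha |\varpi_j|$    $(0 < \alpha < \infty) .$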


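Proof. Let us take the function $Y \mapsto f_\omega(Y)$ such that $f_\omega(y) = \varpi_j$ for each $y \in \omega_j$.
It then follows from (2.17) and (2.18) that

(2.25)   $H_\alpha(X|Y) \leq H_\alpha(X | f_\omega(Y)) = \sum_{j=1}^{m} \Pr[Y \in \omega_j]^{\alpha} H_\alpha(X|\varpi_j)$    $(1 \leq \alpha < \infty) ,$

(2.26)   $\widetilde{H}_\alpha(X|Y) \leq \widetilde{H}_\alpha(X | f_\omega(Y)) = \sum_{j=1}^{m} \Pr[Y \in \omega_j] H_\alpha(X|\varpi_j)$    $(0 < \alpha < \infty) .$

The quantity $H_\alpha(X|\varpi_j)$ is represented as the sum

(2.27)   $H_\alpha(X|\varpi_j) = \sum_{x \in \varpi_j} \eta_\alpha\bigl( p(x|\varpi_j) \bigr) .$

The sum of $p(x|\varpi_j)$ over $x \in \varpi_j$ is equal to 1, whence the term (2.27) does not
exceed $\ln_\alpha |\varpi_j|$. Combining this fact with (2.25) and (2.26) completes the proof. □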

Using Corollary 2, we will obtain upper bounds on conditional $\alpha$-entropies in
some combinatorial problems. To do so, we have to estimate not only the cardinalities
$|\varpi_j|$, but also the probabilities $\Pr[Y \in \omega_j]$. From this viewpoint, the inequality (2.24)
seems to be more appropriate.


3. Shearer's lemma and intersections of $k$-element sets

In this section, we will examine some questions connected with the Shearer
lemma [5]. The properties of the THC entropies lead to many inequalities with
interesting combinatorial applications. We first note the following.

Proposition 3. Let $X = (X_1, \ldots, X_n)$ be a random variable taking values in the set
$S = S_1 \times \cdots \times S_n$, where each coordinate $X_j$ is a random variable taking values in
$S_j$. For all $\alpha \geq 1$, we have

(3.1)    $H_\alpha(X) \leq \sum_{j=1}^{n} H_\alpha(X_j) .$

Proof. The claim (3.1) immediately follows by induction from (2.6). □
The result (3.1) is a straightforward extension of Proposition 15.7.2 of the book
[1]. Hence, we can obtain several corollaries. The first of them is posed as follows.

Corollary 4. Let $\mathcal{F}$ be a family of subsets of the set $\{1, \ldots, n\}$, and let $q_j$ denote
the fraction of members of $\mathcal{F}$ that contain $j$. For all $\alpha \geq 1$, we have

(3.2)    $\ln_\alpha |\mathcal{F}| \leq \sum_{j=1}^{n} h_\alpha(q_j) .$

Proof. To each set $F \in \mathcal{F}$, we assign its characteristic vector $v(F)$, which is a
binary $n$-tuple. Let $X = (X_1, \ldots, X_n)$ be the random variable taking values in
$\{0,1\}^n$ such that

(3.3)    $\Pr[X = v(F)] = \frac{1}{|\mathcal{F}|}$    $\forall F \in \mathcal{F} ,$

whence $H_\alpha(X) = \ln_\alpha |\mathcal{F}|$. The random variable $X_j$ takes values in $\{0,1\}$ and is
the $j$-th entry of the characteristic vector. By definition of $q_j$, the entropy of $X_j$ is
equal to $h_\alpha(q_j)$. Combining this with (3.1) completes the proof. □
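
To see how (3.2) behaves on a concrete family, one can evaluate both sides directly. The Python sketch below takes the family of all 2-subsets of $\{1,2,3,4\}$ as an illustrative choice; the function names (`ln_alpha`, `h_alpha`, `corollary4_sides`) are hypothetical.

```python
import itertools
import math

def ln_alpha(xi, alpha):
    """alpha-logarithm (2.3)."""
    if alpha == 1.0:
        return math.log(xi)
    return (xi ** (1.0 - alpha) - 1.0) / (1.0 - alpha)

def h_alpha(q, alpha):
    """Binary THC entropy (2.5); equal to zero at q = 0 and q = 1."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    if alpha == 1.0:
        return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)
    return (q ** alpha + (1.0 - q) ** alpha - 1.0) / (1.0 - alpha)

def corollary4_sides(family, n, alpha):
    """Return (ln_alpha|F|, sum_j h_alpha(q_j)) for subsets of {1, ..., n}."""
    size = len(family)
    lhs = ln_alpha(size, alpha)
    rhs = 0.0
    for j in range(1, n + 1):
        q_j = sum(1 for member in family if j in member) / size
        rhs += h_alpha(q_j, alpha)
    return lhs, rhs

# All 2-subsets of {1, 2, 3, 4}: six sets, each element lies in half of them.
family = [frozenset(c) for c in itertools.combinations(range(1, 5), 2)]
```

For this family and $\alpha = 2$, the left-hand side is $\ln_2 6 = 5/6$, while the right-hand side is $4 \, h_2(1/2) = 2$.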

The result (3.2) provides an upper estimate for the maximum possible cardinality
of a family of subsets. It is an $\alpha$-entropy version of the basic lemma proved in
[?]. The authors of [?] used tools of information theory for studying a family
of $k$-subsets that satisfy some restrictions. We will further apply (3.2) to a
specific family of $k$-element subsets of the set $\{1, \ldots, n\}$. Suppose that a family
$\mathcal{G} = \{G_1, \ldots, G_m\}$ of $m$ subsets of the set $\{1, \ldots, n\}$ obeys the implication

(3.4)    $\{i, j\} \neq \{s, t\} \ \Longrightarrow \ G_i \cap G_j \neq G_s \cap G_t .$

That is, no pairwise intersections of the $k$-subsets may coincide. We aim to estimate
the cardinality of this family from above. Let us begin with an auxiliary result.

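Lemma 5. For $\alpha \in [1, 3.67]$, the function $\lambda \mapsto h_\alpha(\lambda^2)/\lambda$ is concave for $\lambda \in [0, 1/\sqrt{2}]$.

Proof. We leave out the case $\alpha = 1$, for which the concavity was reported in [?]
for all $\lambda \in [0,1]$. During the proof, we will use the following generalization of
Bernoulli's inequality (see, e.g., section 2.4 of the book [?]). For $x > -1$, one
has

(3.5)    $(1+x)^{r} \geq 1 + rx$    $(r \notin (0,1)) ,$

(3.6)    $(1+x)^{r} \leq 1 + rx$    $(0 \leq r \leq 1) .$

For $\alpha > 1$, we can write the expression

(3.7)    $(\alpha - 1) \, \frac{h_\alpha(\lambda^2)}{\lambda} = - \lambda^{2\alpha-1} + \frac{1 - (1-\lambda^2)^{\alpha}}{\lambda} .$

The term $- \lambda^{2\alpha-1}$ is concave with respect to $\lambda$ for all $\alpha \geq 1$. We will show concavity
of the second term in the right-hand side of (3.7). Let us use the second derivative
test. For an arbitrary function $\xi \mapsto f(\xi)$, one has the general expression

(3.8)    $\frac{d^2}{d\lambda^2} \, \frac{f(\xi)}{\lambda} = \frac{2}{\lambda^3} \bigl( 2\xi^2 f''(\xi) - \xi f'(\xi) + f(\xi) \bigr) ,$

where $\xi = \lambda^2$. Substituting $f_\alpha(\xi) = 1 - (1-\xi)^{\alpha}$ finally gives

(3.9)    $2\xi^2 f''_\alpha(\xi) - \xi f'_\alpha(\xi) + f_\alpha(\xi) = 1 + (1-\xi)^{\alpha-2} \bigl( -1 + c_1 \xi + c_2 \xi^2 \bigr) .$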
The coefficients in (3.9) are calculated as

(3.10)   $c_1 = 2 - \alpha , \qquad c_2 = -1 + 3\alpha - 2\alpha^2 = \frac{1}{8} - 2 \Bigl( \alpha - \frac{3}{4} \Bigr)^2 .$

We will show that the quantity (3.9) is not positive for $\alpha \in [1, 3.67]$ and $\lambda \in
[0, 1/\sqrt{2}]$. Let us consider separately the cases $\alpha \in [1,2]$ and $\alpha \in [2, 3.67]$.

Taking the intervals $\alpha \in [1,2]$ and $\xi \in [0,1]$, we have

(3.11)   $(1-\xi)^{2-\alpha} \leq 1 - (2-\alpha)\xi = 1 - c_1 \xi$    $(1 \leq \alpha \leq 2) .$

This formula is based on (3.6) with $x = -\xi$ and $r = 2 - \alpha$. Due to (3.11), we
rewrite (3.9) in the form

(3.12)   $(1-\xi)^{\alpha-2} \bigl( (1-\xi)^{2-\alpha} - 1 + c_1 \xi + c_2 \xi^2 \bigr) \leq (1-\xi)^{\alpha-2} c_2 \xi^2 \leq 0 ,$

where $c_2 \leq 0$ for $\alpha \in [1,2]$ by (3.10). Here, the concavity takes place for all
$\lambda \in [0,1]$.

The case $\alpha > 2$ is more complicated to analyze. Here, we introduce the positive
parameter $\beta = \alpha - 2$. The condition of negativity of (3.9) is then rewritten as

         $(1-\xi)^{\beta} \bigl( 1 + \beta\xi + \gamma\xi^2 \bigr) - 1 =: F_\beta(\xi) \geq 0 ,$

where $\gamma = - c_2 = 3 + 5\beta + 2\beta^2$. This inequality is to be proved for $\xi = \lambda^2 \leq 1/2$.

We begin with the case $\beta \in [0,1]$. Using the polynomial $p_\beta(\xi) = 1 + \beta\xi + \gamma\xi^2$,
we write the derivative

         $\frac{dF_\beta}{d\xi} = (1-\xi)^{\beta-1} \bigl( (1-\xi) p'_\beta(\xi) - \beta p_\beta(\xi) \bigr) .$

Doing simple calculations, we easily obtain

(3.13)   $(1-\xi) p'_\beta(\xi) - \beta p_\beta(\xi) = \xi \bigl( (2\gamma - \beta - \beta^2) - \gamma (2+\beta) \xi \bigr) .$

As $\xi \geq 0$, the derivative $dF_\beta/d\xi$ is not negative, whenever

(3.14)   $\xi \leq \frac{2\gamma - \beta - \beta^2}{\gamma (2+\beta)} = \frac{6 + 9\beta + 3\beta^2}{(3 + 5\beta + 2\beta^2)(2+\beta)} .$

For $\beta \in [0,1]$, the right-hand side of (3.14) monotonically decreases with $\beta$ from 1 at
$\beta = 0$ down to 0.6 at $\beta = 1$. The condition (3.14) is clearly satisfied for all $\xi \in [0,1/2]$.
Here, the function $F_\beta(\xi)$ does not decrease. Combining this with $F_\beta(0) = 0$, we
finally get $F_\beta(\xi) \geq 0$ for all $\beta \in [0,1]$ and $\xi \in [0,1/2]$.

For $\beta \geq 1$, we apply $(1-\xi)^{\beta} \geq 1 - \beta\xi$ due to (3.5). Thus, the quantity of interest
obeys

         $F_\beta(\xi) \geq (1 - \beta\xi)(1 + \beta\xi + \gamma\xi^2) - 1 = (\gamma - \beta^2) \xi^2 - \beta\gamma\xi^3 .$

The latter is not negative, whenever $\gamma - \beta^2 \geq \beta\gamma\xi$. Due to $\xi \leq 1/2$, we can focus
on the inequality $2(\gamma - \beta^2) - \beta\gamma \geq 0$, or

(3.15)   $6 + 7\beta - 3\beta^2 - 2\beta^3 \geq 0 .$

Inspecting roots of the polynomial, one finds that the condition (3.15) holds for all
$\beta \in [0, 1.67]$, though we use it only for $\beta \in [1, 1.67]$. The latter completes the proof
for $\alpha \in [3, 3.67]$. □
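
The statement of Lemma 5 can be probed numerically through central second differences, which are non-positive for a concave function. The following Python sketch is an illustration rather than a proof; the grid size, the tolerance used when interpreting the output, and the names `g`, `max_second_difference`, and `cubic_315` are assumptions made here.

```python
import math

def h_alpha(q, alpha):
    """Binary THC entropy (2.5); the Shannon case is used at alpha = 1."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    if alpha == 1.0:
        return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)
    return (q ** alpha + (1.0 - q) ** alpha - 1.0) / (1.0 - alpha)

def g(lam, alpha):
    """The function lambda -> h_alpha(lambda^2) / lambda from Lemma 5."""
    return h_alpha(lam * lam, alpha) / lam

def max_second_difference(alpha, lam_max=1.0 / math.sqrt(2.0), steps=400):
    """Largest central second difference of g on a uniform grid in (0, lam_max]."""
    step = lam_max / steps
    worst = -math.inf
    for i in range(2, steps):                  # keeps lam - step > 0
        lam = i * step
        second = g(lam - step, alpha) - 2.0 * g(lam, alpha) + g(lam + step, alpha)
        worst = max(worst, second)
    return worst

def cubic_315(beta):
    """Left-hand side of the condition (3.15)."""
    return 6.0 + 7.0 * beta - 3.0 * beta ** 2 - 2.0 * beta ** 3
```

On such a grid `max_second_difference(alpha)` stays non-positive up to rounding for sample values of $\alpha \in [1, 3.67]$, and `cubic_315` changes sign between $\beta = 1.67$ and $\beta = 1.68$, which is where the endpoint 3.67 in the lemma comes from.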


We have shown concavity of the function $\lambda \mapsto h_\alpha(\lambda^2)/\lambda$ for $\alpha \in [1, 3.67]$ and
$\lambda \in [0, 1/\sqrt{2}]$. When $\alpha \in [1,2]$, the concavity actually holds for $\lambda \in [0,1]$. We now
formulate the desired estimate as follows.
Proposition 6. Let $\mathcal{G} = \{G_1, \ldots, G_m\}$ be a family containing $m$ $k$-subsets of the
set $\{1, \ldots, n\}$, and let $\mathcal{G}$ satisfy the property (3.4). Let $\lambda_j$ denote the proportion of
those members of $\mathcal{G}$ that contain $j$. If the precondition

(3.16)   $\lambda_j \leq \frac{1}{\sqrt{2}}$

holds for all $j \in \{1, \ldots, n\}$, then for all $\alpha \in [1, 3.67]$,

(3.17)   $\ln_\alpha \binom{m}{2} \leq \frac{k}{\lambda} \, h_\alpha(\lambda^2) , \qquad \lambda := \frac{1}{k} \sum_{j=1}^{n} \lambda_j^2 .$

Proof. Let us consider pairwise intersections of members of $\mathcal{G}$. Each $j \in \{1, \ldots, n\}$
will appear in a proportion

         $q_j = \binom{m\lambda_j}{2} \Big/ \binom{m}{2} = \lambda_j \, \frac{m\lambda_j - 1}{m - 1} \leq \lambda_j^2 .$

In the ratio, the denominator gives the number of all pairwise intersections; the
numerator is the number of those pairwise intersections that contain $j$. The binary
entropy (2.5) is concave for $q \in (0,1)$ and reaches its maximum at the point $q = 1/2$.
Hence, it does not decrease on $(0, 1/2)$. Then the precondition (3.16) provides

         $h_\alpha(q_j) \leq h_\alpha(\lambda_j^2) .$

Combining this with Corollary 4, for $\alpha \geq 1$ we obtain

(3.18)   $\ln_\alpha \binom{m}{2} \leq \sum_{j=1}^{n} h_\alpha(\lambda_j^2) = k \sum_{j=1}^{n} \frac{\lambda_j}{k} \, \frac{h_\alpha(\lambda_j^2)}{\lambda_j} .$

We further use $\sum_{j=1}^{n} \lambda_j / k = 1$ and concavity of the function $\lambda \mapsto h_\alpha(\lambda^2)/\lambda$.
Combining (3.18) with the Jensen inequality completes the proof. □


We have obtained an implicit upper bound on $m = |\mathcal{G}|$ in terms of $k = |G_j|$
and the average proportion $\lambda$ of sets containing a particular element. Our result
is a parametric extension of one of the statements proved in [?]. It also differs
in the following two respects. First, the precondition (3.16) is now imposed. On
the other hand, the formula (3.17) is more explicit in the sense that no unknown
asymptotically small terms appear. For the prescribed value of $\lambda \in [0, 1/\sqrt{2}]$, we
could optimize the bound with respect to the parameter $\alpha$. The authors of [?] also
consider a family of $k$-sets, in which the intersection of no two is contained in a
third. Such estimates are connected with one of the questions raised by Erdős.

The statement of Proposition 3 allows a certain extension. In the case of Shannon
entropies, an extension of this kind has been proved by Shearer [?]. It is often
referred to as the Shearer lemma [?]. Its generalization in terms of the
THC entropies is posed as follows.


Proposition 7. Let $X = (X_1, \ldots, X_n)$ be a random variable taking values in the set
$S = S_1 \times \cdots \times S_n$, where each coordinate $X_j$ is a random variable taking values
in $S_j$. For a subset $I$ of $\{1, \ldots, n\}$, let $X(I)$ denote the random variable $(X_j)_{j \in I}$.
Suppose that $\mathcal{G}$ is a family of subsets of $\{1, \ldots, n\}$ and each $j \in \{1, \ldots, n\}$ belongs
to at least $k$ members of $\mathcal{G}$. For $\alpha \geq 1$, we then have

(3.19)   $k H_\alpha(X) \leq \sum_{G \in \mathcal{G}} H_\alpha(X(G)) .$
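Proof. Following [21], we will apply the chain rule. Using (2.11), for $\alpha \geq 1$ we have

         $H_\alpha(X) = \sum_{j=1}^{n} H_\alpha\bigl( X_j \,|\, (X_\ell : \ell < j) \bigr) ,$

         $H_\alpha(X(G)) = \sum_{j \in G} H_\alpha\bigl( X_j \,|\, (X_\ell : \ell \in G ,\ \ell < j) \bigr)$

(3.20)   $\qquad\qquad\quad \geq \sum_{j \in G} H_\alpha\bigl( X_j \,|\, (X_\ell : \ell < j) \bigr) .$

The step (3.20) follows from (2.15), since any string $(X_\ell : \ell < j)$ contains at least as
many elements as $(X_\ell : \ell \in G ,\ \ell < j)$. Summing (3.20) with respect to all $G \in \mathcal{G}$
gives

(3.21)   $\sum_{G \in \mathcal{G}} H_\alpha(X(G)) \geq k \sum_{j=1}^{n} H_\alpha\bigl( X_j \,|\, (X_\ell : \ell < j) \bigr) ,$

because each $j \in \{1, \ldots, n\}$ belongs to at least $k$ members of $\mathcal{G}$. Combining (3.21)
with the above expression for $H_\alpha(X)$ gives (3.19). □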
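
The inequality (3.19) can be illustrated on three binary coordinates covered twice by the pairs $\{1,2\}$, $\{2,3\}$, $\{1,3\}$. In the Python sketch below the coordinates are indexed from 0; the joint distribution, the cover, and the names `marginal` and `shearer_gap` are hypothetical choices made for this illustration.

```python
import itertools
import math

def thc(probs, alpha):
    """THC entropy (2.1); the Shannon entropy (2.4) is used at alpha = 1."""
    probs = [p for p in probs if p > 0]
    if alpha == 1.0:
        return -sum(p * math.log(p) for p in probs)
    return (sum(p ** alpha for p in probs) - 1.0) / (1.0 - alpha)

def marginal(joint, coords):
    """Marginal distribution of the coordinates listed in coords."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[c] for c in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def shearer_gap(joint, cover, k, alpha):
    """Right-hand side minus left-hand side of (3.19)."""
    lhs = k * thc(joint.values(), alpha)
    rhs = sum(thc(marginal(joint, sorted(g)).values(), alpha) for g in cover)
    return rhs - lhs

# Illustrative distribution on {0,1}^3 given by integer weights.
weights = [5, 1, 2, 4, 3, 3, 1, 1]
outcomes = list(itertools.product((0, 1), repeat=3))
joint = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

# Every coordinate belongs to exactly k = 2 members of the cover.
cover = [{0, 1}, {1, 2}, {0, 2}]
```

For this distribution and $\alpha = 2$, the left-hand side of (3.19) equals $2 \times 0.835 = 1.67$ and the right-hand side equals $0.72 + 0.715 + 0.735 = 2.17$.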

The statement of Proposition 7 is a THC-entropy extension of the Shearer lemma.
A related geometric picture was described in [31]. Interesting geometric applications
are also discussed in [1]. An immediate consequence of (3.19) is posed as follows.

Corollary 8. Let $N$ be a finite set, and let $\mathcal{F}$ be a family of subsets of $N$. Let
$\mathcal{G} = \{G_1, \ldots, G_m\}$ be a family of subsets of $N$ such that each element of $N$ appears
in at least $k$ members of $\mathcal{G}$. For each $1 \leq j \leq m$, we define $\mathcal{F}_j := \{ F \cap G_j : F \in \mathcal{F} \}$.
For $\alpha \geq 1$, we then have

(3.22)   $k \ln_\alpha |\mathcal{F}| \leq \sum_{j=1}^{m} \ln_\alpha |\mathcal{F}_j| .$

For $\alpha = 1$, the formula (3.22) is reduced to a result originally proved in [?]. Some
applications of the latter were also described in [?]. Of course, applications of this
kind can further be considered on the basis of (3.22). In some cases, a family of
one-parameter relations may give a stronger bound. An explicit example of this
situation is the case of upper bounds on permanents of square (0,1)-matrices.

4. Upper bounds on permanents of (0,1)-matrices

In this section, we will derive a family of one-parameter upper bounds on the
permanent of a square (0,1)-matrix. The well-known upper bound on permanents was
conjectured by Minc [18] and later proved by Bregman [4]. Bregman's proof
is based on the duality theorem of convex programming and properties of doubly
stochastic matrices. A short elementary proof of this result was given by Schrijver
[26]. Schrijver also mentioned an upper bound for permanents of arbitrary
nonnegative matrices. A similar proof with randomization is explained in [1]. Developing
an approach with randomization, Radhakrishnan presented an entropy-based proof
[20]. Our aim is to study the question using the THC entropies. First, we
recall preliminary facts. Let $A = [[a(i,j)]]$ be a nonnegative $n \times n$ matrix, and let
$S_n$ denote the set of all permutations of $\{1, \ldots, n\}$. The permanent of $A$ is defined
as

(4.1)    $\mathrm{per}(A) := \sum_{\sigma \in S_n} \prod_{i=1}^{n} a(i, \sigma(i)) .$

We further consider matrices with elements $a(i,j) \in \{0,1\}$. By $S \subseteq S_n$, we mean
the set of permutations $\sigma$ such that $a(i, \sigma(i)) = 1$ for all $i \in \{1, \ldots, n\}$. It is obvious
that $\mathrm{per}(A) = |S|$. It is assumed that the matrix contains no rows of only zeros,
since otherwise its permanent is certainly zero. We claim the following.


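Proposition 9. Let $A$ be an $n \times n$ (0,1)-matrix with $\mathrm{per}(A) \neq 0$, and let $r_i \neq 0$ be the
number of ones in the $i$-th row $(i = 1, \ldots, n)$. For all $\alpha \geq 1$, the permanent of $A$ obeys
the inequality

(4.2)    $\ln_\alpha\bigl( \mathrm{per}(A) \bigr) \leq \sum_{i=1}^{n} \frac{1}{r_i} \sum_{j=1}^{r_i} \ln_\alpha(j) .$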

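Proof. Let $\sigma$ be a random permutation chosen uniformly from $S$. We then have
the value $H_\alpha(\sigma) = \ln_\alpha |S|$, which coincides with the left-hand side of (4.2). We will
show that, for $\alpha \geq 1$, the entropy $H_\alpha(\sigma)$ does not exceed the right-hand side of
(4.2). Let us choose a random permutation $\tau \in S_n$ uniformly. Using the chain rule
(2.11), for each permutation $\tau$ we can write

(4.3)    $H_\alpha(\sigma) = H_\alpha\bigl( \sigma(\tau(1)) \bigr) + H_\alpha\bigl( \sigma(\tau(2)) \,|\, \sigma(\tau(1)) \bigr) + \cdots + H_\alpha\bigl( \sigma(\tau(n)) \,|\, \sigma(\tau(1)), \ldots, \sigma(\tau(n-1)) \bigr)$

(4.4)    $\qquad\quad \leq \widetilde{H}_\alpha\bigl( \sigma(\tau(1)) \bigr) + \widetilde{H}_\alpha\bigl( \sigma(\tau(2)) \,|\, \sigma(\tau(1)) \bigr) + \cdots + \widetilde{H}_\alpha\bigl( \sigma(\tau(n)) \,|\, \sigma(\tau(1)), \ldots, \sigma(\tau(n-1)) \bigr) .$

Here, the inequality (4.4) holds for $\alpha \geq 1$. To the given permutation $\tau$ and index
$i \in \{1, \ldots, n\}$, we assign the integer $k(\tau, i) \in \{1, \ldots, n\}$ such that

         $k(\tau, i) := \tau^{-1}(i) , \qquad \sigma(\tau(k)) = \sigma(i) .$

Summing (4.4) over all $\tau \in S_n$, we further obtain

(4.5)    $|S_n| \, H_\alpha(\sigma) \leq \sum_{\tau \in S_n} \Bigl\{ \widetilde{H}_\alpha\bigl( \sigma(\tau(1)) \bigr) + \widetilde{H}_\alpha\bigl( \sigma(\tau(2)) \,|\, \sigma(\tau(1)) \bigr) + \cdots + \widetilde{H}_\alpha\bigl( \sigma(\tau(n)) \,|\, \sigma(\tau(1)), \ldots, \sigma(\tau(n-1)) \bigr) \Bigr\}$

         $\qquad\qquad\quad = \sum_{i=1}^{n} \sum_{\tau \in S_n} \widetilde{H}_\alpha\bigl( \sigma(i) \,|\, \sigma(\tau(1)), \ldots, \sigma(\tau(k-1)) \bigr) .$

At the last step, we gather the contributions of different $\sigma(i)$ separately. For the
given $\sigma \in S$, $\tau \in S_n$, and $i \in \{1, \ldots, n\}$, we define $R_i(\sigma, \tau)$ to be the set of those
column indices that differ from $\sigma(\tau(1)), \ldots, \sigma(\tau(k-1))$ and give 1's in the $i$-th row
[20]. By definition of $r_i$, we have $|R_i(\sigma, \tau)| \leq r_i$. Using (2.24), we then rewrite
(4.5) as

         $|S_n| \, H_\alpha(\sigma) \leq \sum_{i=1}^{n} \sum_{\tau \in S_n} \sum_{j=1}^{r_i} \Pr_{\sigma}\bigl[ |R_i(\sigma, \tau)| = j \bigr] \ln_\alpha(j) .$

Dividing this relation by $|S_n|$ and taking into account the uniform distribution of
$\tau \in S_n$, we immediately obtain

(4.6)    $H_\alpha(\sigma) \leq \sum_{i=1}^{n} \sum_{j=1}^{r_i} \Pr_{\sigma, \tau}\bigl[ |R_i(\sigma, \tau)| = j \bigr] \ln_\alpha(j) .$

We now recall a principal observation of [20] that

         $\Pr_{\sigma, \tau}\bigl[ |R_i(\sigma, \tau)| = j \bigr] = \frac{1}{r_i} .$

Combining this with (4.6) completes the proof. □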


The statement of Proposition 9 leads to a one-parameter family of upper bounds
on permanents. In the limit $\alpha \to 1^{+}$, the relation (4.2) leads to the previous result
[?]

(4.7)    $\mathrm{per}(A) \leq \prod_{i=1}^{n} (r_i!)^{1/r_i} .$

It was conjectured in [18] and then proved in several ways [4, 20, 26]. This result can
naturally be reformulated as an upper bound on the number of perfect matchings
in a bipartite graph [?].

We now consider the significance of the one-parameter bound (4.2). It is instructive
to consider a concrete example. Let the $n \times n$ matrix $A$ have elements $a(1,j) = 1$ for
all $j = 1, \ldots, n$ and $a(i,j) = \delta(i,j)$ for $i = 2, \ldots, n$. That is, our matrix is obtained
from the identity $n \times n$ matrix by filling its first row with ones. We then have
$\mathrm{per}(A) = 1$. On the other hand, one gets $r_1 = n$ and $r_2 = \cdots = r_n = 1$. Let us
compare values of the bounds (4.2) and (4.7). It is easy to apply (4.2) in the case
$\alpha = 2$, since $\ln_2(\xi) = 1 - 1/\xi$ due to (2.3). For $\alpha = 2$, the upper bound (4.2) gives

(4.8)    $1 - \frac{1}{\mathrm{per}(A)} \leq \frac{1}{n} \sum_{j=1}^{n} \Bigl( 1 - \frac{1}{j} \Bigr) = 1 - \frac{\mathcal{H}_n}{n} .$

By $\mathcal{H}_n$, we denote the $n$-th harmonic number [13]. It is well known that the
asymptotic expansion of this number for large $n$ is written as [13]

         $\mathcal{H}_n = \ln n + \gamma + O(1/n) ,$

where $\gamma$ is the Euler-Mascheroni constant. From (4.8), we immediately obtain

(4.9)    $\mathrm{per}(A) \leq \frac{n}{\mathcal{H}_n} = \frac{n}{\ln n + \gamma + O(1/n)} .$

Substituting the same collection of numbers into (4.7) gives

(4.10)   $\mathrm{per}(A) \leq (n!)^{1/n} = \frac{n}{e} \Bigl( 1 + O\Bigl( \frac{\ln n}{n} \Bigr) \Bigr) .$

At the last step, we used the Stirling approximation. For sufficiently large $n$, the
upper bound (4.9) is significantly stronger than (4.10). On the other hand, both
bounds are very far from the actual value of the permanent. Nevertheless, our example
shows the relevance of the result (4.2) proved for $\alpha \geq 1$.

We can further ask for extending the bounds to values $\alpha \in (0,1)$. The corresponding
result can be obtained by an immediate extension of Schrijver's proof [26]. We
have the following statement.
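Proposition 10. Let $A$ be an $n \times n$ (0,1)-matrix with $\mathrm{per}(A) \neq 0$, and let $r_i \neq 0$ be the
number of ones in the $i$-th row $(i = 1, \ldots, n)$. For $\alpha \in (0,1)$, the permanent of $A$ obeys
the inequality

(4.11)   $- \ln_\alpha \Bigl( \frac{1}{\mathrm{per}(A)} \Bigr) \leq \sum_{i=1}^{n} \frac{1}{r_i} \sum_{j=1}^{r_i} \ln_\alpha(j) .$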

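Proof. For convenience, we introduce the function

         $g_\alpha(\xi) := - \xi \ln_\alpha \Bigl( \frac{1}{\xi} \Bigr) ,$

where $\xi \geq 0$ and $\alpha \in (0,1)$. For these values of $\alpha$, the function $\xi \mapsto g_\alpha(\xi)$ is convex.
Due to the Jensen inequality, we have

(4.12)   $g_\alpha \Bigl( \frac{1}{r} \sum_{k=1}^{r} \xi_k \Bigr) \leq \frac{1}{r} \sum_{k=1}^{r} g_\alpha(\xi_k) .$

We will prove (4.11) by induction on $n$. For $n = 1$, the result is trivial. Suppose
that the claim is already proved for $(n-1) \times (n-1)$ matrices. The permanent of
an $n \times n$ matrix can be decomposed as

(4.13)   $\mathrm{per}(A) = \sum_{k=1 ,\ a(i,k)=1}^{n} \mathrm{per}\bigl( A(i,k) \bigr) .$

Here, the submatrix $A(i,k)$ is obtained from $A$ by eliminating the $i$-th row and the
$k$-th column. Combining (4.12) with (4.13) gives

(4.14)   $\mathrm{per}(A) \, (-1) \ln_\alpha \Bigl( \frac{r_i}{\mathrm{per}(A)} \Bigr) \leq \sum_{k=1 ,\ a(i,k)=1}^{n} g_\alpha\bigl( \mathrm{per}(A(i,k)) \bigr) .$

From the definition of the $\alpha$-logarithm, we have the identity

(4.15)   $\ln_\alpha(r\xi) = \ln_\alpha(r) + r^{1-\alpha} \ln_\alpha(\xi) .$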

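Summing (4.14) with respect to $i \in \{1, \ldots, n\}$, we therefore obtain

(4.16)   $\mathrm{per}(A) \Bigl\{ - \sum_{i=1}^{n} \ln_\alpha(r_i) - n \ln_\alpha \Bigl( \frac{1}{\mathrm{per}(A)} \Bigr) \Bigr\}$

(4.17)   $\qquad \leq \sum_{i=1}^{n} \sum_{k=1 ,\ a(i,k)=1}^{n} \mathrm{per}\bigl( A(i,k) \bigr) \, (-1) \ln_\alpha \Bigl( \frac{1}{\mathrm{per}(A(i,k))} \Bigr)$

(4.18)   $\qquad = \sum_{\sigma \in S} \sum_{i=1}^{n} (-1) \ln_\alpha \Bigl( \frac{1}{\mathrm{per}(A(i, \sigma(i)))} \Bigr) .$

To prove (4.16), we used (4.15) and the relation $n \leq \sum_{i=1}^{n} r_i^{1-\alpha}$ satisfied for $\alpha \in
(0,1)$. To justify (4.18), we note the following fact. In the double sum (4.18), the
number of terms from any pair $(i,k)$ equals the number of those $\sigma \in S$ for which
$\sigma(i) = k$. The latter number is $\mathrm{per}(A(i,k))$ for $a(i,k) = 1$, and zero otherwise. We
now apply the induction hypothesis to each $\mathrm{per}(A(i, \sigma(i)))$ in (4.18). The left-hand
side of (4.16) is no greater than

         $\sum_{\sigma \in S} \sum_{i=1}^{n} \Bigl\{ \sum_{\ell \neq i ,\ a(\ell, \sigma(i)) = 0} \frac{1}{r_\ell} \sum_{j=1}^{r_\ell} \ln_\alpha(j) + \sum_{\ell \neq i ,\ a(\ell, \sigma(i)) = 1} \frac{1}{r_\ell - 1} \sum_{j=1}^{r_\ell - 1} \ln_\alpha(j) \Bigr\}$

(4.19)   $\qquad = \sum_{\sigma \in S} \sum_{\ell=1}^{n} \Bigl\{ \sum_{i \neq \ell ,\ a(\ell, \sigma(i)) = 0} \frac{1}{r_\ell} \sum_{j=1}^{r_\ell} \ln_\alpha(j) + \sum_{i \neq \ell ,\ a(\ell, \sigma(i)) = 1} \frac{1}{r_\ell - 1} \sum_{j=1}^{r_\ell - 1} \ln_\alpha(j) \Bigr\}$

(4.20)   $\qquad = \sum_{\sigma \in S} \sum_{\ell=1}^{n} \Bigl\{ \frac{n - r_\ell}{r_\ell} \sum_{j=1}^{r_\ell} \ln_\alpha(j) + \sum_{j=1}^{r_\ell - 1} \ln_\alpha(j) \Bigr\} .$

In the step (4.19), we change the order of summation. The step (4.20) is justified as
follows. First, the number of $i$ such that $i \neq \ell$ and $a(\ell, \sigma(i)) = 0$ is equal to $(n - r_\ell)$.
Second, the number of $i$ such that $i \neq \ell$ and $a(\ell, \sigma(i)) = 1$ is equal to $(r_\ell - 1)$.
These observations allow one to compute the sums with respect to $i$ and get (4.20).
Adding the term $\mathrm{per}(A) \sum_{1 \leq \ell \leq n} \ln_\alpha(r_\ell)$ to both (4.16) and (4.20), we immediately
obtain

         $\mathrm{per}(A) \, (-n) \ln_\alpha \Bigl( \frac{1}{\mathrm{per}(A)} \Bigr) \leq \mathrm{per}(A) \sum_{\ell=1}^{n} \frac{n}{r_\ell} \sum_{j=1}^{r_\ell} \ln_\alpha(j) .$

The latter completes the proof. □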

In the limit $\alpha \to 1^{-}$, the result (4.11) leads to the previous result (4.7). In this
regard, it is a proper extension of (4.2) to the parameter range $\alpha \in (0,1)$. Together,
the bounds (4.2) and (4.11) cover all the values $\alpha > 0$.
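
The comparison of the bounds (4.2) and (4.7) on the example matrix of this section can be reproduced numerically for small $n$. The Python sketch below covers only the range $\alpha \geq 1$ of Proposition 9 and inverts the $\alpha$-logarithm to turn (4.2) into an explicit bound on $\mathrm{per}(A)$; the function names and the brute-force evaluation of the permanent are assumptions made for this illustration.

```python
import itertools
import math

def ln_alpha(xi, alpha):
    """alpha-logarithm (2.3)."""
    if alpha == 1.0:
        return math.log(xi)
    return (xi ** (1.0 - alpha) - 1.0) / (1.0 - alpha)

def permanent(matrix):
    """Brute-force permanent (4.1); feasible only for small n."""
    n = len(matrix)
    return sum(
        math.prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in itertools.permutations(range(n))
    )

def entropic_rhs(matrix, alpha):
    """Right-hand side of (4.2): sum_i (1/r_i) sum_{j <= r_i} ln_alpha(j)."""
    total = 0.0
    for row in matrix:
        r = sum(row)                              # number of ones in the row
        total += sum(ln_alpha(j, alpha) for j in range(1, r + 1)) / r
    return total

def permanent_bound(matrix, alpha):
    """Upper bound on per(A) obtained by inverting ln_alpha in (4.2), alpha >= 1."""
    rhs = entropic_rhs(matrix, alpha)
    if alpha == 1.0:
        return math.exp(rhs)                      # this is the bound (4.7)
    base = 1.0 + (1.0 - alpha) * rhs
    if base <= 0.0:
        return math.inf                           # (4.2) gives no finite bound
    return base ** (1.0 / (1.0 - alpha))

def example_matrix(n):
    """Identity matrix whose first row is filled with ones."""
    return [[1 if (i == 0 or i == j) else 0 for j in range(n)] for i in range(n)]
```

For `example_matrix(6)` the call `permanent_bound(A, 2.0)` returns $6/\mathcal{H}_6$, in agreement with (4.9), while `permanent_bound(A, 1.0)` returns $(6!)^{1/6}$ as in (4.10). Note that the inversion yields a finite number only when the right-hand side of (4.2) is below $1/(\alpha - 1)$, which is why the function returns infinity otherwise.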